Mixture Models for Single-Cell Assays with Application to Vaccine 

Studies 



Greg Finak 1 , Andrew McDavid 1 , Pratip Chattopadhyay 3 , Maria Dominguez 3 , Steve De 
Rosa 1 ' 2 , Mario Roederer 3 , and Raphael Gottardo 1 

1 Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center 

(FHCRC), Seattle, WA 
2 HIV Vaccine Trials Network, Fred Hutchinson Cancer Research Center (FHCRC), Seattle, 
3 : WA 

3 Vaccine Research Center, NIAID, NIH, 40 Convent Drive, Rm 5509, Bethesda, MD 20892 

August 31, 2012 



Abstract 

Blood and tissue are composed of many functionally distinct cell subsets. In immunological 
studies, these can only be measured accurately using single-cell assays. The characterization of 
these small cell subsets is crucial to decipher system level biological changes. For this reason, 
an increasing number of studies rely on assays that provide single-cell measurements of multiple 
genes and proteins from bulk cell samples. A common problem in the analysis of such data 
is to identify biomarkers (or combinations of biomarkers) that are differentially expressed be- 
tween two biological conditions (e.g., before/after vaccination), where expression is defined as 
the proportion of cells expressing that biomarker (or biomarker combination) in the cell sub- 
set(s) of interest. Here, we present a Bayesian hierarchical framework based on a beta-binomial 
mixture model for testing for differential biomarker expression using single-cell assays. Our 
model allows the inference to be subject specific, as is typically required when accessing vac- 
cine responses, while borrowing strength across subjects through common prior distributions. 
We propose two approaches for parameter estimation: an empirical-Bayes approach using an 
Expectation-Maximization algorithm and a fully Bayesian one based on a Markov chain Monte 
Carlo algorithm. We compare our method against frequentist approaches for single-cell assays 
including Fisher's exact test, a likelihood ratio test, and basic log-fold changes. Using several 
experimental assays measuring proteins or genes at the single-cell level and simulated data, we 
show that our method has higher sensitivity and specificity than alternative methods. Addi- 
tional simulations show that our framework is also robust to model misspecification. Finally, we 
also demonstrate how our approach can be extended to testing multivariate differential expres- 
sion across multiple biomarker combinations using a Dirichlet-multinomial model and illustrate 
this multivariate approach using single-cell gene expression data and simulations. 
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1 Introduction 



Cell populations, particularly in the immune system, are never truly homogeneous; individual cells 
may be in different biochemical states that define functional but measurable differences between 
them. This single-cell heterogeneity is informative, but lost in assays that measure cell mixtures. 
For this reason, endpoints in vaccine and immunological studies are measured through a variety of 
assays that provide single-cell measurements of multiple genes and proteins. In the 1970s, single- 
cell analysis was revolutionized with the development of fluorescence-based flow cytometry (FCM) . 
Since then, instrumentation and reagent advances have enabled the study of numerous cellular pro- 
cesses via the simultaneous single-cell measurement of multiple surface and intracellular biomarkers 
(up to 17 biomarkers). More recent technological development have drastically extended the ca- 
pabilities of single-cell cytomet ry to measure dozen s of simultaneous parameters (i.e. proteins, 
genes, cytokines, etc.) per cell ( Bendall et al. . 201 ll ). Although cells sorted using well-established 
surface biomarkers m ay appear homogeneous, mRNA express ion of other genes within these cells 



can be heterogeneous (jNarsinh et all l201ll . iFlatz et al.l . l201ll ) and could further characterize and 
subset these cells. A new technology based on microfluidic arrays combined with multiplexed 
polymerase chain reactions (PCR) can now be used to perform thousands of PCRs in a single 
device, enabling simultaneous, high-thro ughput g e ne ex pression measurements at the single-cell 
level across hundreds of cells and genes (jPieprzvki . l200Sh . While classic gene expression microar- 
rays sum the expression from many individual cells, the intrinsic stochas tic nature of biochemica l 
processes results in relatively large cell-to-cell gene expression variability ( van Qudenaarden , 20091 ) . 
This heterogeneity may carry important information, thus single-cell expression data should not 
be analyzed in the same fashion as cell-population level data. Special treatment of single-cell level 
data, which preserves information about population heterogeneity, is warranted in general. For 
this reason, single-cell assays are an important tool in immunology, providing a functional and 
phenotypic snapshot of the immune system at a given time. These assays typically measure multi- 
ple biomarkers simultaneously on individual cells in a heterogeneous mixture such as whole blood 
or peripheral blood mononuclear cells (PBMC), and are used for immune monitoring of disease , 



vaccine research, and diagn osis of haematological malignancies (lAltman et all Il996l . iBetts et al. 



20061 . Ilnokuma et all . 120071 ) . 



During analysis, cell level biomarker fluorescence intensities are typically thresholded as positive 
or negative so that subsets with different multivariate +/— combinations can be obtained as Boolean 
combinations. For some assays (e.g., flow cytometry), the positivity thresholds are set based on 
prior biological knowledge while for others, thresholds are given by the assay technology. This is 
the case for the Fluidigm technology where genes are recorded as absent (not expressed) or present 
(expressed) at the single-cell level. After this thresholding step, we obtain a Boolean matrix of 
dimension N x K, where N is the number of cells recorded and K is number of biomarkers. Using 
this matrix, one can form 2 K putative cell subsets obtained as Boolean combinations. When K 
is large there is a combinatorial explosion of the number of subsets, and many of these might 
be small or even empty. A common statistical problem is, for a given biomarker combination, 
to identify subjects for whom the proportion of cells expressing that combination is significantly 
different between two experimental conditions (e.g., before and after vaccination). Note that we 
use the term 'subject' throughout the paper, but the approaches described are general and can be 
applied to other experimental units (e.g., animal studies). 

A motivating example from vaccine research is the flow cytometric intracellular cytokine staining 
(ICS) assay, which is used to identify and quantify subjects' immune responses to a vaccine. Upon 
vaccination, antigen in the vaccine is taken up and presented to CD4 or CD8 T-cells via antigen 
presenting cells. While not all T-cells can recognize all antigens, those that recognize antigens in 
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the vaccine become activated and produce a variety of cytokines, further promoting the immune 
response. After activation, this antigen-specific subpopulation proliferates and can persist in the 
immune system for s ome time providing me mory that can more rapidly recognize the same antigen 
again in the future ( McKinstry et al. . 2O10l ). The antigen-specific T-cell subpopulations (i.e. the 
subset that can respond to one specific antigen) constitute a very small fraction of the total number 
of CD4 and CD8 T-cells. The ICS assay measures the number of antigen-specific T-cells in PBMC 
or whole blood by measuring cytokine production in response to activation following stimulation 
by an antigen that closely matches what was present in the original vaccine. Individual cells are 
labelled using fluorescently conjugated antibodies against phenotypic biomarkers (CD3, CD4, and 
CD8), used to subset T-cells, and functional biomarkers (cytokines ) used to define antigen specific 



T-cells (jHorton et all 120071 . [Pe Rosa et all 120041 . iBetts et alllioOfih . A sufficiently large number of 
cells must be collected to ensure that the rare cell populations can be detected. Subsequently, each 
individual cell is classified as either positive or negative for each maker based on predetermined 
thresholds, then the number of cells matching each subpopulation phenotype is counted. 

These counts are compared between antigen stimulated and unstimulated samples from a subject 
to identify significant differences. Subjects who generate a response after stimulation are called 
responders, whereas subjects that do not show any differences are called non-responders. In many 
immunological studies, the size of the functionally distinct subpopulations (i.e., the number of 
positive cells) is very low (relative to the total number of cells), and real biological differences 
might be difficult to detect. 

Although there is no standard a pproach to ana l yzing ICS assays, current methods r ange from ad- 
hoc r ules based on log- fold changes (jTrigona et al.l . l2003h . to non — parametric m ethods ([Sinclair et al. 



2004), to permutation tests based on Hotelling's T 2 stat istics (INasonl. 120061) , to exact tests of 2x2 



contingency tables (e.g., Fisher's exact t est and x 2 test) ( Horton et all 120071 . IProschan and Nasonl . 



20091 . IPeioerl et all l20ld . INasonl . l2006h . All of these methods test subjects separately, and no 



information is shared across observations even though one could expect some similarities across 
responders (or non-responders). 

The framework developed in this paper, named MIMOSA (Mixture Models for Single Cell 
Assays), addresses these issues explicitly. In our model, cell counts are modelled by a binomial 
(or multinomial in the multivariate case) distribution and information is shared across subjects by 
means of a prior distribution placed on the unknown proportion(s) of the binomial (or multinomial) 
likelihood. In order to discriminate between responders and non-responders, the prior is written 
as a mixture of two beta (or Dirichlet in the multivariate case) distributions where the hyper- 
parameters for each mixture component are shared across subjects. This sharing of information 
helps regularize proportion estimates when the cell counts are small, which is typical with single-cell 
assays, and increases sensitivity and specificity when detecting responders. Because our framework 
is multivariate in nature, multiple cell subsets can be modelled simultaneousl y, which coul d help 
detect small biological changes that are spread out across multiple cell subsets (Nason, 20061 ). Our 
paper is organized as follows; Section [2] introduces the data and notations used in the paper. In 
Section O we present our model for testing differential biomarker expression in the univariate case. 
Section |4] compares our approach to alternative methods and tests the robustness of our model. In 
Section [5] we present a multivariate extension of our model that can be used to test multivariate 
biomarker differential expression and present some results using a single-cell gene expression data. 
Finally, in Section [6] we discuss our findings and future work. 
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2 Notation and Data 



In the remainder of the paper, we use the following notation to describe our model. We assume that 
we observe cell counts from / subjects in two conditions: stimulated and un-stimulated. Each cell 
can either be positive or negative for a biomarker. Given a set of K biomarkers, the measured cells 
can be classified into 2 K positive/negative biomarker combinations. We denote by n s ik and n U ik, 
k = 1, . . . , 2 K , i = 1, . . . , I, the observed counts for the 2 K combinations in the stimulated and un- 
stimulated samples, respectively. We denote by N s i = and N U i = n uik the total number 
of cells measured for subject i in each sample, respectively. For ease of notation, we denote by 
the vector of observed counts for subject i, i.e., yj = (n S j, n U j) where n s j = {n s ik : k = 1, . . . , 2 K } 
and n ui = {n uik : k = 1, . . . , 2 K }. Finally, we define y = (yi, . . . , yj). 

We consider two types of immunological single-cell assays: flow cytometry and single-cell gene 
expression, as described below. 

Flow cytometry: The primary dataset used here is an ICS data set generated as part of a trial 
testing the GeoVax DNA and MVA (Modified Vaccinia Ankara) HIV va ccine in a prime-boos t 



regimen (prime at zero and two months, boost at four and six months) (jGoepfert et all 12011 



The goal of this data set was to assess the immune response to the vaccine across multiple antigen 
stimulations, time points, cytokines and T-cell subsets. Here, we analyze a subset of the data 
consisting of 98 subject from the vaccine group at two time points: day and day 182. Three 
cytokines (IFN-7, TNFa and IL2) were measured at the single-cell level for each subject and time 
point, with and without stimulations with an antigen (here we focus on HIV Envelope peptide 
pool) matching part of the vaccine. For ease of presentation we restricted ourselves to the CD4+ 
T-cell subsets. Samples on day were taken just before vaccination and no response is expected 
there. The corresponding samples can be used as negative controls. Conversely, day 182 (26 weeks) 
should be close to the immunogenicity peak, and many subjects are expected to respond, for some 
cytokines at least. 

Fluidigm single- cell gene expression: This is a single-cell gene expression data set of sorted 
CD8+ T-cells from sixteen subjects. T-cells isolated by flow cytometry from sixteen subjects were 
stimulated in blocks of four subjects with four different antigens (HIV Gag, HIV Nef, CMV pp65 
tmlO, and CMV pp65 nlv5) and gene expression post-stimulation measured at the single-cell level 
using the BioMark system (Fluidigm) 96 x 96 well arrays. The expression from the simulated 
samples was compared to paired, unstimulated controls. 

Although the immunological experiments described above will often look at multiple antigens 
or stimulations, in the models presented here we consider only one stimulation (i.e. antigen or 
condition) at a time vs. unstimulated. The issue of multiple antigens is handled through multiple 
testing correction. 



3 Differential expression with one biomarker 

Datasets like the ones presented here are usually analyzed in a univariate fashion to avoid being 
underpowered due to the large number of combinations and the potential for very small cell counts 
in many of the combinations. Here, by univariate, we mean that we have only one positive cell 
subset. This cell subset can be defined by considering the expression of one biomarker alone 
(marginalizing over all other measured biomarkers) such as A+ (vs. A—), or considering a specific 
positive biomarker combination (and marginalizing over everything else) such as A+ and/or B+ 
(vs. A— /B— ). Without loss of generality, we treat the univariate case as a one biomarker case 
(i.e., K = 1). In this case, for a given subject, the data can be summarized in a contingency table 
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Table 1: 2 x 2 contingency table of counts for biomarker positive and negative 
cells between stimulated (s) and unstimulated (u) conditions for a given subject 
i. 







Biomarker 






$ative Positive 


Stimulated 


N si 




Unstimulated 


N ui 





of +/— cell counts across the un-stimulated and stimulated samples as depicted in Table [TJ 

For a given subject and stimulation, we consider a biomarker to be differentially expressed if the 
proportion of positive cells in the stimulated samples is different from the number of positive cells 
in the un-stimulated sample. Subjects that show differential expression will be called responders 
for that biomarker. In this section, we are concerned with identifying differential expression one 
biomarker at a time, using a beta-binomial mixture model as described in what follows. 



3.1 Beta-binomial model 

For a given subject i, the positive cell counts for the stimulated and un-stimulated samples are 
jointly modeled as follows: 

\psi) ~ B'm(N si ,p si ) and ( 

Tlui \Pui 

) ~ Bm(N ui ,p ui ) 

where p s i, p u i are the unknown proportions for the stimulated and un-stimulated paired samples, 
respectively. In order to detect responding subjects, we consider two competing models: 

M ■ Pui = Psi and Mi : p u i ¥= Psi- 

Under the null model, A4o, there is no difference between the stimulated and un-stimulated samples, 
and the proportions are equal (yet the cell counts can differ). Under the alternative model, A4\, 
there is a difference in proportions between the two samples and the subject i is a responder. In 
some studies, such as the ICS data used here, the proportion of positive cells is expected to only 
increase after stimulation, in which case the alternative model should be defined as p s > p u . This 
alternative parametrization is described in Web Appendix B, and we refer to it as the one-sided 
model. 



3.2 Priors 

Our model shares information across all subjects using exchangeable ( Bernardo! 19961 ) Beta priors 
on the unknown proportions, as follows: 

(Pui\zi = 0) ~ Beta(a u ,f3 u ) 

(Psi\zi = 1) ~ Beta(a s ,/3 S ) and (p u i\zi = 1) ~ Beta(a M , /3 U ), 

where %i is an indicator variable equal to one if subject i is a responder, i.e., M.\ is true, and 
zero otherwise, and a u , f3 u , a s , f3 s are unknown hyper-parameters shared across all subjects. Note 
that the parameters a u and j3 u are explicitly shared across the two models, whereas a s and j3 s 
are only present in the alternative model. Finally, we assume that Zi ~ Be(w;) are independent 
draws from a Bernoulli distribution with probability w, where w represents the (unknown) propor- 
tion of responders. It follows that marginally, i.e., after integrating Zj, the p u i and p s i are then 
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jointly distributed as a mixture of a one dimensional Beta distribution and a product of two Beta 
distributions (with a possible constraint), with mixing parameter w. Treating the zfs as missing 
data, the unknown parameter vector 6 = (a u , /3 U , a s , w) can be es t imate d in an Empirical-Bayes 



fashion using Expectation-Maximization algorithm ([Dempster et all 119771 ) as described in Section 
13.31 As an alternative, we also describe a fully Bayesian model, where the hyperparameters a u ,f3 u , 
a s , and (3 S are each given vague exponential priors with mean 10 3 , and w is assumed to be drawn 
from a uniform distribution between and 1. In this case, all parameters will be estimated via a 
Markov chain Monte Carlo algorithm as described in Section 13.31 

3.3 Parameter estimation 

In our proposed EM and MCMC algorithms, we greatly simplify our calculations by directly uti- 
lizing the marginal likelihoods, Lo and Li, obtained after marginalizing p s i and p u i from the null 
and alternative likelihoods. Given the conjugacy of the priors, the marginal likelihoods Lo and Li 
are available in closed- forms (Web Appendix A), and are given by, 



L (a u ,(3 u \yi 



N ui \ (N s 

B(n si + n ui + a u , N si - n si + N ui - n ui + f3 u ) 



B(a u ,f3 u ) 



and 

Li(a u , A,,as,/3s|yi 



N„A (N si 



B(n ui + a u , N ui - n ui + /3 U ) 

B(a u ,(3 u ) 
B(n s j + a s , N si - n s j + j3 s ) 

B(a s ,(3 s ) 



(1) 



Above, B is the Beta function. Assuming that the missing dcitci, 2^, % — 1, . . . . I, are known, we 
define the complete data log-likelihood: 

/(0|y,z) = S~] Zik(a u , ^u\yi) + (1 - Zi)h(a u , /3 u ,a s , f3 s \yi)+ 

(2) 

Zilog(w) + (1 - 2j)log(l - w), 

where lo and l\ are the log marginal-likelihoods and 6 = (a u , (3 U , a s , f3 s , w) is the vector of param- 
eters to be estimated. In the one-sided case, the alternative prior specification must satisfy the 
constraint p s > p u , and the marginal likelihood derivation involves the calculation of a normalizing 
constant that is not available in closed- form but can easily be estimated. All calculations for the 
one-sided case are described in Web Appendix B. 
EM algorithm 

Given an estimate of the model parameter vector = |a M , /3 U , a s , /3 S , u)j and the data y, the E 
step consists of calculating the posterior probabilities of differential expression, defined by 

Zi = Bx{ Zi = l|y,0) = 

w ■ Li(a n ,/3 u ,a s ,/3 s , |yj) 



(1 -w) ■ L (a u , Putyi) + w ■ Li(a u ,f3 u ,a s ,f3 s \yi) 
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The M-step then consist of optimizing the complete-data log-likelihood over 6 after replacing Z{ 
by 2j in (|2J)- Straightforward calculations lead to w = Yli^i/^-i but unfortunately no closed form 
solutions exist for the remaining parameters. We use nu merical optimization as imple mented in 
R's optim function to estimate the remaining parameters ( Ihaka and Gentleman . 19961 ) . Starting 
from some initial values, the EM algorithm iterates between the E and M steps until convergence. 
In our case, we initialize the Zi's using Fisher's exact test to assign each observation to either the 
null or alternative model components. We then use the estimated Zj's to estimate the p u iS and 
PsiS and use these to set the hyper-parameters to their method-of-moments estimates. 
MCMC algorithm 

We generated realizations from the posterior distribution via MCMC algorithms (jCelfandl . fl99fil ) . 
All updates were done via Metropolis-Hastings sampling except for the z^s and w that were per- 
formed via Gi bbs samplings. Det a ils ab out t he algorithms are given in Web Appendix A. We used 
the method of Raftery and Lewis ( 19921 ) and Raftery ( 19961 ) to determine the number of iterations, 
based on a short pilot run of the sampler. For each dataset presented here, we calculated that 
no more than about 1,000,000 iterations with 50,000 burn-in iterations was sufficient to estimate 
standard posterior quantities. To leave some margin, we used 2,000,000 iterations after 50,000 
burn-in iterations for each dataset explored here. 



4 Results 

In this section, we apply our MIMOSA model to the data described in Section [21 and present the 
results of a simulation study based on the ICS data. We evaluated and compared the performance of 
MIMOSA against Fisher's exact test, the likelihood ratio test, and log fold-change by ROC (receiver 
operator characteristic) curve analysis and by comparing the observed FD R (false disco very rate) 



against the nominal FDR (expected false discovery rate) for each data set ([Storeyj, [2002J) , where a 
false discovery (for the ICS data) is a day sample (non-responder) that is incorrectly identified 
as a responder by the model (or a competing method). 

4.1 ICS 

Using the ICS data, we performed an ROC (receiver operator characteristic) analysis to assess 
the sensitivity and specificity of the one-sided MIMOSA model compared to a one-sided Fisher's 
exact test, log fold-change, and a likelihood ratio test based on the MIMOSA model for identifying 
vaccine responders and non-responders. We considered observations at the day time point as true 
negatives, and observations at the day 182 time point as true positives (potentially underestimating 
the sensitivity of all methods considered here due to real non-responders at day 182 being treated 
as true positives). The MIMOSA model has higher sensitivity and specificity than Fisher's exact 
test, the likelihood ratio test, or log fold-change for discriminating vaccine responders and non- 
responders as shown by the ROC curves on Figure [H panels A,C,E. At an FDR between 10-20%, 
MIMOSA would lead to about 20% more true positives being detected. Our comparisons also show 
that ranking based on log-fold change alone is not reliable and should not be used. In addition, 
MIMOSA gave estimates of the observed false discovery rate that are better or comparable to 
competing methods (Figure [H panels B,D,F). Here we present the results b ased on IL2 and IFN- 7 



alone and the subset IL2 and/or IFN-7 that were used in the original study (IGoepfert et all 12011 



These results are consistent for other cytokines and cytokine combinations (see Web Figure A). 
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Figure 1: Performance of MIMOSA (EM and MCMC implementations, one-sided model) and com- 
peting methods on ICS data from the example flow cytometry data set. Sensitivity and specificity 
(ROC analysis) as well as observed and nominal false discovery rates for positivity calls from CD4+ 
T-cells stimulated with A-B) ENV-1-PTEG and expressing IFN-7 or C-D) ENV-1-PTEG and ex- 
pressing IL2. E-F) ENV-1-PTEG and expressing IFN-7 and/or IL2. ROC and FDR plots of other 
cytokine combinations can be found in Web Figure A. This figure appears in color in the electronic 
version of this article. 8 



4.2 Single-cell gene expression 

We applied the MIMOSA model to a Fluidigm single-cell gene expression data set. We used the two- 
sided MIMOSA model because genes could be regulated upward or downward upon stimulation. In 
order to detect stimulation specific changes of expression, we fit our model to each gene within each 
stimulation. The results presented in Figure [2] show that MIMOSA identifies stimulation-specific 
differences in the proportions of cells expressing each gene while preserving inter-subject variability 
(Figure [2] A, B). These patterns are evident in the posterior probabilities (Figure [2] A) and preserved 
in the posterior estimates of the differences of proportions (Figure [2] B). A similar analysis using 
a two-sided Fisher's exact test and clustering the signed FDR adjust p-values (Figure [2] C) does 
not reveal any stimulation-specific patterns. For an FDR of 10%, Fisher's exact test identified 47 
significant genes, while MIMOSA identified 50 significant genes. Both methods identified 39 genes 
in common. 

4.3 Simulation Studies 

We examined the performance of the constrained (p s > p u ) and unconstrained (p s ^ p u ) beta- 
binomial mixture models via simulations. For the simulation, we used hyper-parameters estimated 
from a one-sided MIMOSA model fit to ICS data (IL2 univariate) from the primary immunogenicity 
time point. We simulated data from this constrained model with 200 observations, a response rate 
of 60%, N = 1,000, 5,000, and 10,000 events, with ten independent realizations of data for each 
N. We fit the one-sided MIMOSA model to this data. We evaluated the sensitivity and specificity 
of the model's ability to correctly identify observations from the "responder" and "non-responder" 
groups through analysis of ROC curves, and compared against Fisher's exact test, the likelihood 
ratio test, and log fold-change. We repeated this procedure for the two-sided models fit to two-sided 
data (Figure [3] A, C). In addition, we examined the nominal vs. observed FDR to assess the ability 
of each method to properly estimate the FDR (Figure [3] B,D). 

For both the constrained and unconstrained simulations, MIMOSA was superior to competing 
methods, including Fisher's exact test, with respect to sensitivity and specificity even at small 
values of N (Figure [3] A and C and Web Figure B, panel E). Additionally, the estimated FDR for 
MIMOSA more closely reflected the nominal FDR compared to Fisher's exact test and competing 
methods (Figure [31 panels B, D, and Web Figure B panel F). 

To assess the sensitivity of the model to deviations from model assumptions, we repeated the 
simulations with the cell proportions drawn from truncated normal distributions with support (0, 1), 
rather than beta distributions. The means and variances of the truncated normal distributions were 
set to the maximum likelihood estimates of the beta distributions defined by the hyper-parameters 
a and j3 estimated from the ICS data set (see Web Figure B panels C and D). Even under these 
departures from the model assumptions, the unconstrained MIMOSA model outperformed Fisher's 
exact test. 

5 Differential expression across biomarker combinations 

Our beta-binomial model described in Section 13.11 can be generalized to a Dirichlet-multinomial 
model to assess differential expression across multiple biomarker combinations. As described in the 
data section, we now have counts for each biomarker combination, denoted by n s j = {n s ik : k = 
I,..., 2 K } and n ui = {n uik :k = l,..., 2 K }. 
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Figure 2: Signed posterior probability, difference and log-odds ratio of the proportion of single-cells 
expressing each gene on a 96x96 Fluidigm array. The posterior probability of response times the 
sign of the change in expression is shown in A) (red indicates a decrease, green an increase, relative 
to the control). Columns and rows are clustered based on these signed posterior probabilities. B) 
The posterior differences in proportion of cells expressing a gene in the stimulated vs. control 
samples. Rows and columns are ordered as in A) for comparison. The traces show the deviations 
of each cell from zero. Colors along the columns denote different stimulations (green: CMV pp65 
nlv5, red: HIV Gag, orange: HIV Nef, yellow: CMV pp65 tmlO). C) Clustering of the signed 
q-values from Fisher's exact test. Genes selected from Fisher's exact test at the 10% FDR level. 
This figure appears in color in the electronic version of this article. 
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Figure 3: Comparison of positivity detection methods on data simulated from the one-sided and 
two-sided models. Ten simulations were generated at an N of 5,000 total counts using hyper- 
parameter estimates from real ICS data (IFN-7 expressing CD4+ T-cells stimulated with ENV-1- 
PTEG from HVTN065) with a five-fold effect size between responder and non-responder compo- 
nents. A) Average ROC curve over the 10 simulated data sets (N=5,000), one-sided B) Average 
observed and nominal false discovery rate over 10 simulated data sets (N=5,000), one-sided. C) 
Average ROC curves, two-sided model. D) Average observed and nominal FDR, two-sided model. 
Curves are shown for MIMOSA, Fisher's exact test, the likelihood ratio test, and log fold-change. 
Results for MIMOSA fit to a model violating model assumptions, as well as other values of N are 
in Web Figure B. This figure appears in color in the electronic version of this article. 
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5.1 Model 



In our multivariate model, the beta distribution is replaced by a multinomial distribution, as follows: 
(n ui |p„i, ) ~ M{N ui , p ui ) and (n si |p si ) ~ M(N si , p si ) 

where Nt SiU -u = Yl n {s.u}ik are the number of cells collected and p u i and p s i are the unknown 
fc=i 

proportions for the un-stimulated and stimulated samples, respectively. 
5.2 Prior 

As in the one-biomarker case, we share information across subjects using an exchangeable prior on 
the unknown proportions. This time the beta priors are replaced by Dirichlet priors, such that 

(p ui \zi = 0) ~ Dir(a M ), 

(Pui\zi = 1) ~ Dir(a M ) and (p s i\zi = 1) ~ Dir(a s ), 



where the indicator variable Z\ is defined in Section [3.21 i.e., Z\ ~ Be(u>). As in the beta-binomial 
case, both an EM and MCMC algorithms can be used for parameter estimation. When using a fully 
Bayesian approach via MCMC, we use the same priors for ol^ u s ^ and w as for the beta-binomial 
model. 

5.3 Parameter estimation 

Again, to simplify the estimation problem, we make use of the marginal likelihoods that can be 
obtained in closed forms (see Web Appendix C). For the null component, the marginal likelihood 
L is given by, 

j i | \ B(o! u + n ui + n si ) N si \ N ui \ 

MH^u n si, n ui) — — 



where B is the 2- f!r -dimensional Beta function defined as B(a) = \\ k r(afc)/r(^ fe a^). Similarly 
the marginal likelihood for the alternative model is given by 

)B(a s + n s i) N s il N U {\ 

Li(a u ,a s i\n s i,n ui ) = - 



B(a s )B(a u ) ]J k n sik \ \[ k n uik \ ' 

The estimation procedures (both EM and MCMC based) for the Dirichlet-multinomial distribution 
are the same as for the beta-binomial model except that the number of parameters to estimate is 
larger. We initialize the in the EM algorithm with the positivity calls from the multivariate 
Fisher's exact test. In our experience, the performance of the EM algorithm greatly deteriorates 
for K > 3, and is more dependent on the initial values and can fail to converge in many instances. 
Although our MCMC algorithm is slightly more computational, it does not suffer from this problem 
and provides a robust alternative when K is large. More details about our multivariate MCMC 
algorithm is given in Web Appendix C. 

5.4 Poly functionality in Fluidigm Single-Cell Gene Expression Data 

As a proof-of-concept, we applied our multivariate MIMOSA model for two specific genes in the 
Fluidigm data, namely BIRC3 and CCL5. For this example, K = 2, and we have four possible 
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Figure 4: Counts of cells expressing different combinations of BIRC3 and CCL5 genes in the A) 
unstimulated and B) stimulated conditions. No difference is observed from the marginalized counts, 
while multivariate MIMOSA detects a difference between stimulated and unstimulated conditions 
in 13 of 16 samples. Sample names highlighted in red identify those where MIMOSA did not detect 
a difference. This figure appears in color in the electronic version of this article. 

combinations. In Figure H] we show heatmaps of the counts of cells expressing all combinations of 
the BIRC3 and CCL5 genes in unstimulated and stimulated samples (Figure U] A, B). Only CCL5 
positive cells express BIRC3, and its expression increases upon stimulation. The typical approach 
to analyzing poly-functional populations from intracellular cytokine staining data (summing the 
counts over all possible polyfunctional cell populations as in IL2+ and/or IFN-7+) would not be 
appropriate in this case, since changes in the counts of these different cell populations occur in 
both directions. That is, the number of BIRC3-/CCL5+ cells decreases upon stimulation, while 
the number of BIRC3+/CCL5+ cells increases. When marginalizing over these cell populations, 
no difference is apparent in any of the samples. In contrast, the multivariate MIMOSA model tests 
all polyfunctional cell subpopulations simultaneously, and detects significant differences between 
stimulated and unstimulated conditions in 13 of the 16 samples (Figure H]D, black labels). Testing 
all combinations simultaneously is an advantage over performing multiple univariate tests on the 
subject combinations, which requires multiplicity adjustment and a potential loss of power. 

Since the Fluidigm data set has a limited number of observations (100 cells and 16 samples), we 
could not look at more than two biomarkers at once. Therefore, we performed simulations in eight 
dimensions to assess the power of the multivariate MIMOSA model compared to Fisher's exact test 
and the likelihood ratio test on the resulting 2x8 tables (Figure [5] A-C). These results show that 
multivariate MIMOSA has significantly increased power to detect true differences in multivariate 
data, even with small counts and small effect sizes, and the model better fits the data than the 
competing standard approaches tested (Figure [5] B). 



6 Discussion 



Experimentalists already have access to a myriad of single-cell assays such as flow cytometry, mass 
cytometry and multiplexed quantitative-PCR, to name a few. As sin gle-cell assays become even 



more routine once sequencing at the single-cell level becomes practical (jRamskold et all 12012 ). the 



development of effective statistical methods to detect differences in gene or protein expression at 
the single-cell level is becoming increasingly important. Current approaches for single-cell assays 
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Figure 5: Multivariate simulations from a two-sided model. Ten, eight-dimensional data sets were 
simulated from a two-sided model with an effect sizes of 2.5 x 10 -3 and —2.5 x 10~ 3 in two of the 
eight dimensions (N=l,500). Multivariate MIMOSA was compared against Fisher's exact test, and 
the likelihood ratio test. A) Average ROC curves for the competing methods over 10 simulations. 
B) Average observed and nominal false discovery rate for each method over 10 simulations. This 
figure appears in color in the electronic version of this article. 



are for the most part simplistic such as the t-test, x 2 test, and Fisher's exact test, and resulting 
inference can be quite sub-optimal, especially when the cell counts are small. Most importantly, 
these methods do not share information across samples, resulting in less power to detect true 
differences than empirical-B ayes and hierarchical modeling approaches, which are widely app lied 



in the microarray literature (jKendziorski et al.1 . 120031 . iNewton et al.1 . l200ll . I Smyth et all 120051 ) . In 



addition, most of these methods are univariate in nature and inappropriate for high-dimensional, 
next-generation single-cell assays. 

The MIMOSA model presented here uses a mixture model framework of beta-binomial or 
Dirichlet-multinomial distributions to model counts in experimental subjects across multiple con- 
ditions {i.e., vaccine responders and non-responders). Information is shared across responders and 
non-responders through exchangeable beta or Dirichlet priors, increasing the power to detect true 
differences between treatment and control conditions compared to Fisher's exact test, even when 
the underlying model assumptions are violated (Figures [3] and Web Figure B). The univariate 
MIMOSA model based on the Beta-Binomial distribution allows us to constrain the alternative 
hypothesis to the case p s > p u , where the proportion of cells in the stimulated sample is strictly 
greater than the proportion of cells in the matched unstimulated sample. This has proven to be 
useful for the ICS data where stimulation induced changes are expected to be one-sided. 

Although we used two single-cell assay platforms as motivating examples, our MIMOSA model 
can be applied to any type of single-cell assay where cells are dichotomized into positive and negative 
sets, counted and compared across different conditions. In the case of the Fluidigm data, most 
analysis methods have been focused on identifying differences in the continuous part of the signal 
ignoring cells that are undetected (i.e., the gene is not expressed in the cell), or the information is 



used for pre- filtering (IFlatz et all l201ll ). The ability of MIMOSA to identify stimulation-specific 



expression patterns in single-cell gene expression data demonstrates not only the broader utility of 
the method, but importantly, also demonstrates that biologically relevant signal is present in the 
proportion of cells expressing each gene under different conditions (Figure [2] A-C). 

Detecting differences in poly- functional cell populations (i.e., identifying changes in cell popula- 
tions that co-express multiple proteins, cytokines, or genes) is important in immunolo gy, since it al- 



lows the identification of more precisely defined, more homogeneous cell populations (jMilush et al. 
20091 ). In the context of HIV, poly-functional cell populations have been shown to be correlated 



14 



with long-term disease non-progression, while in the context of vaccination studies (e .g. in Leish- 



manial poly-functional r e sponses have been co rrelated with protection from disease (jBetts et al 



2006, iDarrah et all 120071 . iPrecopio et al.1 . 120071 ). In the ICS data used here, the stimulation is ex- 



pected to increase only the number of antigen specific cells detected. Hence, if a specific cell subset 
expressing multiple biomarkers is being differentially expressed, differential expression based on the 
marginal cell counts should also be detected. As such, identifying poly-functional cytokine profiles 
from ICS data can be done in an iterative way. First, univariate tests on marginal populations are 
performed, and then specific cell subsets expressing the positive biomarkers detected are tested. 
However, this iterative (univariate) approach might not be satisfactory due to the large number 
of possible combinations that need to be tested, and a multivariate approach might be preferable. 
In that others have pointed out, in order to have the most power to detect a true differ- 

ence, th e statistical te st should be selected taking into account only the cytokine combinations of 
interest (INason . 

For two-sided changes, as with the Fluidigm data, changes in poly-functional cell populations 
are not always detectable when looking at the marginal populations (Figure H] A-C). In this case, 
the use of multivariate model, as our Dirichlet-multinomial model, will become important to detect 
differential biomarker expression. Here, we have shown that MIMOSA has higher sensitivity and 
specificity than the competing methods to identify true differences between conditions in multi- 
variate count data (Figure H] A, and Figure [5] A,C), and the model generally provides a better fit 
to the single-cell assay count data obtained from studies with these types of experimental designs 
(Figure [5] B). Unfortunately, the limited number of samples in the Fluidigm data prevented us from 
looking at co-expression involving more than two genes. In the case of more than two biomarkers, 
the number of parameters to estimate for our Dirichlet-multinomial model is 2 K+1 + 1, which is 
large even for moderate values of K. As an example, we would need both, a large number of subjects 
and a large number of events (cells) collected, to properly estimate the 33 parameters for K = 4. 
A solution would be to explore alternative model parameterizations that could be used to reduce 
the number of required parameters. For example, one could assume that the hyper-parameters are 
constant across biomarker combinations, i.e., ott StU \k = a{ s , u } f° r an k, and the number of param- 
eters would be reduced to 3 for any K. As attractive as this might sound, such a model would 
be unrealistic given that certain stimulations are known to induce expression of certain biomarkers 
more than others. More exploratory work will need to be done in this area once high dimensional 
single-cell level data with large number of samples become available. 

All of the results presented here were obtained with a software implementation of the EM and 
MCMC MIMOSA models in R and C++, and is freely available from GitHub 
( http : // www . github . org/ gf inak/MIMOSA) . An R package will soon be released as part of the 
Bioconductor project (|Gentleman et al" . 20041 ) . 



Supplementary Materials 

Web Appendices A, B, C, and Web Figures A and B referenced in Sections 121 and |3|3.31 and [5] are 
available in the attached Web-based supplementary material. 
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Web Appendix A: Computational details for the beta-binomial model 
Marginal likelihood derivations 

For a given subject i, the null marginal likelihood is obtained after integrating out the prior from the likelihood for 
model Mi), as follows, 



Lo{a u ,$ u \n si ,n ui ) = [ Pr(n si ,n ui \p u )%(p u \a u , p„)d/?„ 
Jo 

= / Pr{n s \p u )Pr{n u \p u )lz{p u \a Ul $ u )dp u 
Jo 



-Pu) N *- n ° ( N ")p n u(l P^^-Cr^V -Puf»- l dPu 
\n s J \n u J B(a u ,p u ) 

=( Ns ) ( Na )^rn / 1 ^ +n " +a "" 1 ( 1 -^) %+A '""" s """ +pM " ld ^ 

\n s J \n u J B(a„,P„) Jo 
N s \ fN„\ B(n s + n u + a u ,N s +N U - n s -n u + Po) 



y nJ\n u J B(a„,P„) 
Similarly, the alternative marginal likelihood for a given subject is defined as, 

-l pi 



Li(a H ,P„,a,,p,|n SI -,n HI ) = / / Pr(n J( - p s )n(p u , p s |a„, P„ , a s , $ s )dp s dp u 

Jo Jo 
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\nj \n u ) B(a H ,p„) B(a„p s ) Jo Pu { Pu) 
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N s \ ( N u \ B(n„ + a u ,N u - n u + p M )B(n. y + a s ,N s - n s + p,) 



n s )\n u ) B(a„,p„)B(a„p s ) 
MCMC algorithm 

In what follows, we use (x\y) to denote the conditional distribution of x given y. In particular, we use (x\ ■ ■ ■ ) to denote 
the distribution of x conditional on everything else in the model. Our MCMC algorithms cycles through the following 
steps: 
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1 . Update each a u , p„ , 0C S and p s by Metropolis-Hastings using a Gaussian symmetric proposal where the variance 
of the proposal is tuned for each parameter using the approach of Gelman et al. (2004); Raftery and Lewis 
(1992); Raftery (1996). 

2. Update w by Gibbs sampling using the full conditional, 

H-)~Beta(£ Zj ,£(l- Zj )). 

3. for each i, update z, by Gibbs sampling using the following full conditional, 

(z,|-)~B(l,p i ). 

where, 

w-Li(a B ,P„,oc,s,P 1 s|n B i,/i 1 ri) 

Pi 



w • Li (a„, p H , a. v , p, |n„,-, n si ) + ( 1 - w) ■ L (a„ , p„ 
For each updated parameter, step 1 above involves the calculation of the following acceptance ratio, (e.g. a u , below), 

L(af w ,p M; q,,p, < |...)7t(ar) 
L(a„,p„,ofe,p J |---)Jc(oc«) 

where L is the complete marginal likelihood conditional on z defined as, 

L(a„ , p„ , a s , p s | n„ , n s , z) = L Zi (a H , P„ , a 4 , p. s | n„ , n s ) 

i 

and 7t is the prior distribution of a„. The obvious changes in the above expression are made for the acceptance ratios 
of oc s , pj , p„ . In our case each parameter has the same exponential prior with mean 1 , 000 

Web Appendix B: Constrained beta-binomial model 

We can define a model where we constrain the stimulated proportions under the alternative model such that p s > p u . 
In this case, the only changes required are for the alternative marginal likelihood Li defined in the main manuscript 
by (1). Due to the constraint, the normalizing constant of the prior under the alternative (model M\) is not given by 
B(oc H , P H )B((Xj, p. v ) but requires computing 

Z(a„,p„,a„p. ( ) = C Pu^-'il-puf'- 1 f p^-\\-psf s - l dp s d Pu . 

Jo Jp u 

Using this expression, the constrained alternative marginal likelihood can be written as 

r / p oi n f N ui\ {Nsi\Z(n U j + a u ,N U i-n U i + $ u ,n S i + a s ,N S j -n si + $ s ) 
M(a„,p„,oc. s ,p s |y,-J = 



n ui )\n si ) Z(a„,p„,a. v ,p,) 

In general, there is no closed-form expression for Z(-), and a numerical approximation must be used. Let us denote 
by Z(a„, p„, a s ., p. s ) the approximation. A natural way to estimate Z is to use Monte Carlo integration. Indeed, we can 
write 

Z(a„,p H ,a. v ,p,) =B(oc M ,p M )B(a s ,p j ) /' j£ ~F "f" ' (1 - F as3s [p u ))dp u (1) 

where F as $ s is the cumulative distribution function of a beta random variable with parameters a s and p s . Using this 
identity, it can be seen that Z(a„, p„, a s ■, p. s ) can be approximated by 

K 

Z(a u , p„, a,, P.) w B(a„, p„)B(a. ( , p,) £ (1 - F asfis (X k )) 

k=l 

where the X^'s are iid beta distributed random variables with parameters a s , P. v and K is the number of terms used 
in the Monte Carlo approximation. This approximation works relatively well with our EM implementation and does 



2 



not significantly increase the computing time. Unfortunately, the number of terms (i.e. value of K) required for 
the approximation to be good might be large and computing such a normalizing constant at each iteration would 
significantly slow down our MCMC implementation. As it tuns out, a better approximation can be obtained when a s 
and p. s are integers. In this case, the cdf function in (1) can be calculated exactly using integration by parts, as follows, 

F r (d)~ Pl + V 1 (P.v + a.v-1)! , _ y %+a s -j 
Fa s $M- jm , + a s -;)! (1 P »> P » 

Then using this identity, we obtain 

Z(a„, p„, a s , p.) = B(a s , p/ + £ ' ^^f^ f\i- Pu f^i p ^^-i d p u 

7=p, y-iP« + u s J)- jo 

= B ( a -P.s) E ' _ j .;, B(P M +;)B(a„ + P, + a,-y). 

j=p s i-lP« + U v ])• 

Typically, in ICS data oc v is relatively small leading to relatively few terms in the sum. However, the use of this exact 
identity in our MCMC algorithm requires the use of discrete priors on oc s and p s , which can be restrictive in terms of 
fit (e.g., if the true oc s is less than one) and can render mixing in the MCMC more difficult. In addition, even though the 
computation is exact and much faster for small values of a s , which is typically the case with ICS data, it is still more 
demanding than the unconstrained model. In our case, we have decided to use the unconstrained model and simply 
fix the Zi to zero if the empirical proportion for the un-stimulated sample, p u , is greater than that of the stimulation 
sample, p s . Indeed, in the one-sided case, if p u > p s the associated individual should be a non-responder and thus 
Zi = 0. In our experience, this computational shortcut performs just as well as the true one-sided implementation while 
being computationally much less demanding. 

Web Appendix C: Computational details for the Dirichlet-multinomial model 
Marginal likelihood derivations 

Because our Dirichlet-multinomial is a direct extension of the beta-binomial model, the marginal likelihoods are 
obtained in the exact same fashion. For a given subject, the null marginal likelihood is defined as 



L (a H |n J( -,n Hi -) = J ■ J Pr(n J( -,n Hi -|p M )7t(p H |a H )dp„ 

\V>„) JO u 



. N,;' N„! 1 f j-j r n sik +n uik +a uk -\ r 

rU-Hy-*! Ukn u ik ] - B(a„) Jo y 
N s ! N„! B(n. Vi - + n„ + a M ) 



rLt«.H*! Uk n uik ] - B(a„) 
Similarly, the alternative marginal likelihood for a given subject is: 

N,! N„! B(n M ; + a„)B(n, ri + q,) 
FLt^it! Ilfc»«*! B(a„)B(a J ) 

MCMC algorithm 

The MCMC algorithm for the Dirichlet-multinomial model is analogous to the beta-binomial, above. The parameter 
vectors CL S ,U U are updated component- wise: 

1 . Update each a,^, a u t using a Gaussian symmetric proposal distribution with the variance of each proposal tuned 
using the approach of Gelman et al. (2004). 
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Web Figure A: Comparison of MIMOSA on other cytokines and cytokine combinations for ENV-1 -PTEG stimulated 
CD4+ T-cells from the HVTN065 trial. This figure appears in color in the electronic version of this article. 



2. Update w by Gibbs sampling using the full conditional, 

(H-")~Beta(E*,£(l-zO) 



3. For each i, update zi using the full conditional, 



where 



fe|-)~B(l,p,) 
w\,\{a ll ,a s \n si ,'a u 



w • Lj (a„ , a s \n si , n ui ) + (1 - w) ■ Lo(a„ |n„- , n H /) 
For each parameter component updated in step 1 above, compute the acceptance ratio (e.g. a u k, below): 

L(al™,a u{ _ k} ,a,\---)x(a™) 
h(a u ,a s \---)n(a uk ) 

where K is the prior distribution of the parameter, and aj,^^} = {a Ll j : j ^ k}. Again, we have used the same 
exponential prior with mean 1,000 for each parameter. L is the complete marginal likelihood conditional on z, 
defined as 

L(a„,a s |n Ji -,n 1(i -,z) = ]"[L Zi .(a„,a s |n«,n„ ( -)- 
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Web Figure B: Unconstrained MIMOSA model fit to data from a model violating model assumptions and to two-sided 
data with small counts. Data was simulated from a model where proportions were sampled from a truncated normal 
distribution over [0, 1] rather than a Beta distribution. A) The average ROC from 10 simulation with N=5,000 events. 
B) The average observed and nominal FDR from 10 simulations with N=5,000 events. C) Average ROC for N=1,000 
events D) Average observed and nominal FDR for N=1,000 events. MIMOSA fit to two-sided data were simulated 
from the standard model with N=1,000 events. E) Average ROC curves from 10 simulations and F) average observed 
vs nominal FDR from 10 simulations. This figure appears in color in the electronic version of this article. 
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